9.1.5 Mean Squared Error (MSE)

Suppose that we would like to estimate the value of an unobserved random variable X given
that we have observed Y = y. In general, our estimate $\hat{x}$ is a function of y:

\[ \hat{x} = g(y). \]


The error in our estimate is given by 


\[ \tilde{X} = X - \hat{x} = X - g(y). \]

Often, we are interested in the mean squared error (MSE) given by 

\[ E[(X - \hat{x})^2 \mid Y = y] = E[(X - g(y))^2 \mid Y = y]. \]

One way of finding a point estimate $\hat{x} = g(y)$ is to find a function g that minimizes the
mean squared error (MSE). Here, we show that $g(y) = E[X \mid Y = y]$ has the lowest MSE
among all possible estimators. That is why it is called the minimum mean squared error
(MMSE) estimate.

For simplicity, let us first consider the case that we would like to estimate X without observing 
anything. What would be our best estimate of X in that case? Let a be our estimate of X. Then, 
the MSE is given by 


\begin{align*}
h(a) &= E[(X - a)^2] \\
     &= E[X^2] - 2aE[X] + a^2.
\end{align*}

This is a quadratic function of a, and we can find the minimizing value of a by differentiation: 

\[ h'(a) = -2E[X] + 2a. \]

Setting $h'(a) = 0$, we conclude that the minimizing value of a is

\[ a = E[X]. \]
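The claim that the mean minimizes the MSE can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes an exponential sample as a stand-in for X (any distribution would do) and uses a hypothetical helper `empirical_mse` to approximate h(a) by a sample average.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical sample standing in for draws of X (assumption: Exp(1)).
samples = [random.expovariate(1.0) for _ in range(2000)]

def empirical_mse(a, xs):
    """Sample average of (x - a)^2, an estimate of h(a) = E[(X - a)^2]."""
    return sum((x - a) ** 2 for x in xs) / len(xs)

sample_mean = sum(samples) / len(samples)

# Evaluate h(a) on a grid of candidate guesses a = 0.00, 0.01, ..., 3.00
# and pick the guess with the smallest empirical MSE.
grid = [i / 100 for i in range(301)]
best_a = min(grid, key=lambda a: empirical_mse(a, samples))

print(f"sample mean    : {sample_mean:.3f}")
print(f"grid minimizer : {best_a:.2f}")
```

The grid minimizer should agree with the sample mean up to the grid spacing, which mirrors the conclusion a = EX.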


Now, if we have observed Y = y, we can repeat the above argument. The only difference is 
that everything is conditioned on Y = y. More specifically, the MSE is given by 

\begin{align*}
h(a) &= E[(X - a)^2 \mid Y = y] \\
     &= E[X^2 \mid Y = y] - 2aE[X \mid Y = y] + a^2.
\end{align*}

Again, we obtain a quadratic function of a, and by differentiation we obtain the MMSE
estimate of X given Y = y as

\[ \hat{x}_M = E[X \mid Y = y]. \]
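For completeness, the differentiation step in the conditional case can be written out as follows. This is a short worked sketch using the same function h(a) as above. The second-derivative check is added here and is not stated explicitly in the original argument.

```latex
\begin{align*}
h'(a) &= -2E[X \mid Y = y] + 2a, \\
h'(a) = 0 \;&\Longrightarrow\; a = E[X \mid Y = y], \\
h''(a) &= 2 > 0 \quad \text{(so the stationary point is a minimum).}
\end{align*}
```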


Suppose that we would like to estimate the value of an unobserved random variable X by
observing the value of a random variable Y = y. In general, our estimate $\hat{x}$ is a function of y, so
we can write

\[ \hat{X} = g(Y). \]

Note that, since Y is a random variable, the estimator $\hat{X} = g(Y)$ is also a random variable. The
error in our estimate is given by

\[ \tilde{X} = X - \hat{X} = X - g(Y), \]

which is also a random variable. We can then define the mean squared error (MSE) of this 
estimator by 


\[ E[(X - \hat{X})^2] = E[(X - g(Y))^2]. \]

From our discussion above, we can conclude that the conditional expectation $\hat{X}_M = E[X \mid Y]$ has
the lowest MSE among all estimators g(Y).

Mean Squared Error (MSE) of an Estimator

Let $\hat{X} = g(Y)$ be an estimator of the random variable X, given that we have observed the
random variable Y. The mean squared error (MSE) of this estimator is defined as

\[ E[(X - \hat{X})^2] = E[(X - g(Y))^2]. \]

The MMSE estimator of X,

\[ \hat{X}_M = E[X \mid Y], \]

has the lowest MSE among all possible estimators.
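To make the definition concrete, the sketch below compares the MSE of the conditional expectation with a few other estimators on a small discrete example. The joint pmf and the competing estimators are hypothetical, chosen only for illustration. Exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
pmf = {
    (0, 0): F(1, 4), (1, 0): F(1, 4),
    (1, 1): F(1, 8), (2, 1): F(3, 8),
}

def cond_mean(y):
    """E[X | Y = y] computed from the joint pmf."""
    p_y = sum(p for (_, yy), p in pmf.items() if yy == y)
    return sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_y

def mse(g):
    """E[(X - g(Y))^2] for an estimator g."""
    return sum(p * (x - g(y)) ** 2 for (x, y), p in pmf.items())

mean_x = sum(x * p for (x, _), p in pmf.items())

mmse = mse(cond_mean)
others = {
    "g(y) = y": mse(lambda y: y),
    "g(y) = y + 1/2": mse(lambda y: y + F(1, 2)),
    "g(y) = E[X] (ignores Y)": mse(lambda y: mean_x),
}

print("MSE of E[X|Y]:", mmse)
for name, value in others.items():
    print(f"MSE of {name}:", value)
```

In this example every alternative has an MSE at least as large as that of the conditional expectation, as the statement above predicts.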

Properties of the Estimation Error: 

Here, we would like to study the MSE of the conditional expectation. First, note that 
\begin{align*}
E[\hat{X}_M] &= E[E[X \mid Y]] \\
             &= E[X] \quad \text{(by the law of iterated expectations).}
\end{align*}

Therefore, $\hat{X}_M = E[X \mid Y]$ is an unbiased estimator of X. In other words, for $\hat{X}_M = E[X \mid Y]$, the
estimation error, $\tilde{X}$, is a zero-mean random variable:

\[ E[\tilde{X}] = E[X] - E[\hat{X}_M] = 0. \]

Before going any further, let us state and prove a useful lemma. 

Lemma 9.1 

Define the random variable $W = E[\tilde{X} \mid Y]$. Let $\hat{X}_M = E[X \mid Y]$ be the MMSE estimator of X
given Y, and let $\tilde{X} = X - \hat{X}_M$ be the estimation error. Then, we have

a. W = 0.

b. For any function g(Y), we have $E[\tilde{X} \cdot g(Y)] = 0$.

Proof: 

a. We can write 

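\begin{align*}
W &= E[\tilde{X} \mid Y] \\
  &= E[X - \hat{X}_M \mid Y] \\
  &= E[X \mid Y] - E[\hat{X}_M \mid Y] \\
  &= \hat{X}_M - E[\hat{X}_M \mid Y] \\
  &= \hat{X}_M - \hat{X}_M = 0.
\end{align*}

The last line resulted because $\hat{X}_M$ is a function of Y, so $E[\hat{X}_M \mid Y] = \hat{X}_M$.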

b. First, note that 

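\begin{align*}
E[\tilde{X} \cdot g(Y) \mid Y] &= g(Y) E[\tilde{X} \mid Y] \\
                               &= g(Y) \cdot W = 0.
\end{align*}

Next, by the law of iterated expectations, we have

\[ E[\tilde{X} \cdot g(Y)] = E\big[E[\tilde{X} \cdot g(Y) \mid Y]\big] = 0. \]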


https://www.probabilitycourse.com/chapter9/9_1_5_mean_squared_error_MSE.php 


9/18/2018 


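We are now ready to state a very interesting property of the estimation error for the MMSE
estimator. Namely, we show that the estimation error, $\tilde{X}$, and $\hat{X}_M$ are uncorrelated. To see this,
note that

\begin{align*}
\mathrm{Cov}(\tilde{X}, \hat{X}_M) &= E[\tilde{X} \cdot \hat{X}_M] - E[\tilde{X}] E[\hat{X}_M] \\
  &= E[\tilde{X} \cdot \hat{X}_M] \quad \text{(since } E[\tilde{X}] = 0\text{)} \\
  &= E[\tilde{X} \cdot g(Y)] \quad \text{(since } \hat{X}_M \text{ is a function of } Y\text{)} \\
  &= 0 \quad \text{(by Lemma 9.1).}
\end{align*}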

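Now, let us look at Var(X). The estimation error is $\tilde{X} = X - \hat{X}_M$, so

\[ X = \tilde{X} + \hat{X}_M. \]

Since $\mathrm{Cov}(\tilde{X}, \hat{X}_M) = 0$, we conclude

\[ \mathrm{Var}(X) = \mathrm{Var}(\hat{X}_M) + \mathrm{Var}(\tilde{X}). \tag{9.3} \]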

The above formula can be interpreted as follows. Part of the variance of X is explained by the
variance in $\hat{X}_M$. The remaining part is the variance in the estimation error. In other words, if $\hat{X}_M$
captures most of the variation in X, then the error will be small. Note also that we can rewrite
Equation 9.3 as

\[ E[X^2] - E[X]^2 = E[\hat{X}_M^2] - E[\hat{X}_M]^2 + E[\tilde{X}^2] - E[\tilde{X}]^2. \]

Note that

\[ E[\hat{X}_M] = E[X], \quad E[\tilde{X}] = 0. \]

We conclude

\[ E[X^2] = E[\hat{X}_M^2] + E[\tilde{X}^2]. \]

Some Additional Properties of the MMSE Estimator 

- The MMSE estimator, $\hat{X}_M = E[X \mid Y]$, is an unbiased estimator of X, i.e.,

\[ E[\hat{X}_M] = E[X], \quad E[\tilde{X}] = 0. \]

- The estimation error, $\tilde{X}$, and $\hat{X}_M$ are uncorrelated:

\[ \mathrm{Cov}(\tilde{X}, \hat{X}_M) = 0. \]

- We have

\[ \mathrm{Var}(X) = \mathrm{Var}(\hat{X}_M) + \mathrm{Var}(\tilde{X}), \]

\[ E[X^2] = E[\hat{X}_M^2] + E[\tilde{X}^2]. \]
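The properties listed above can be verified exactly on a small discrete example. The sketch below assumes a hypothetical joint pmf (illustrative numbers only) and checks unbiasedness, the orthogonality property of Lemma 9.1 for one arbitrary choice of g, zero covariance, and the variance decomposition.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
pmf = {
    (0, 0): F(1, 4), (1, 0): F(1, 4),
    (1, 1): F(1, 8), (2, 1): F(3, 8),
}

def expect(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())

def x_hat(y):
    """MMSE estimator E[X | Y = y]."""
    p_y = sum(p for (_, yy), p in pmf.items() if yy == y)
    return sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_y

def var(f):
    """Var(f(X, Y)) = E[f^2] - (E[f])^2."""
    return expect(lambda x, y: f(x, y) ** 2) - expect(f) ** 2

def estimate(x, y):
    return x_hat(y)

def error(x, y):
    return x - x_hat(y)

# Unbiasedness: the estimation error has zero mean.
mean_error = expect(error)

# Lemma 9.1(b) for one arbitrary function g(Y) = Y^2 + 3.
orthogonality = expect(lambda x, y: error(x, y) * (y ** 2 + 3))

# Covariance between the error and the estimator.
cov = expect(lambda x, y: error(x, y) * estimate(x, y)) - mean_error * expect(estimate)

var_x = var(lambda x, y: x)
var_hat = var(estimate)
var_err = var(error)

print("E[error] =", mean_error)
print("E[error * g(Y)] =", orthogonality)
print("Cov(error, estimate) =", cov)
print("Var(X) =", var_x, "| Var(estimate) + Var(error) =", var_hat + var_err)
```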

Let us look at an example to practice the above concepts. This is an example involving jointly
normal random variables. Thus, before solving the example, it is useful to remember the
properties of jointly normal random variables. Remember that two random variables X and Y
are jointly normal if aX + bY has a normal distribution for all $a, b \in \mathbb{R}$. As we have seen
before, if X and Y are jointly normal random variables with parameters $\mu_X$, $\sigma_X^2$, $\mu_Y$, $\sigma_Y^2$, and $\rho$,
then, given Y = y, X is normally distributed with

\begin{align*}
E[X \mid Y = y] &= \mu_X + \rho \sigma_X \frac{y - \mu_Y}{\sigma_Y}, \\
\mathrm{Var}(X \mid Y = y) &= (1 - \rho^2) \sigma_X^2.
\end{align*}


Example 9.7 

Let $X \sim N(0, 1)$ and

\[ Y = X + W, \]

where $W \sim N(0, 1)$ is independent of X.

a. Find the MMSE estimator of X given Y, ($\hat{X}_M$).

b. Find the MSE of this estimator, using $\mathrm{MSE} = E[(X - \hat{X}_M)^2]$.

c. Check that $E[X^2] = E[\hat{X}_M^2] + E[\tilde{X}^2]$.

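Solution

Since X and W are independent and normal, Y is also normal. Moreover, X and Y
are also jointly normal, since for all $a, b \in \mathbb{R}$, we have

\[ aX + bY = (a + b)X + bW, \]

which is also a normal random variable. Note also,

\begin{align*}
\mathrm{Cov}(X, Y) &= \mathrm{Cov}(X, X + W) \\
                   &= \mathrm{Cov}(X, X) + \mathrm{Cov}(X, W) \\
                   &= \mathrm{Var}(X) = 1.
\end{align*}

Since X and W are independent, $\mathrm{Var}(Y) = \mathrm{Var}(X) + \mathrm{Var}(W) = 2$, so $\sigma_Y = \sqrt{2}$. Therefore,

\begin{align*}
\rho(X, Y) &= \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \\
           &= \frac{1}{1 \cdot \sqrt{2}} = \frac{1}{\sqrt{2}}.
\end{align*}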


a. The MMSE estimator of X given Y is

\begin{align*}
\hat{X}_M &= E[X \mid Y] \\
          &= \mu_X + \rho \sigma_X \frac{Y - \mu_Y}{\sigma_Y} \\
          &= \frac{Y}{2}.
\end{align*}


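b. The MSE of this estimator is given by

\begin{align*}
E[(X - \hat{X}_M)^2] &= E\left[\left(X - \frac{Y}{2}\right)^2\right] \\
  &= E\left[X^2 - XY + \frac{Y^2}{4}\right] \\
  &= E[X^2] - E[X(X + W)] + \frac{E[Y^2]}{4} \\
  &= E[X^2] - E[X^2] - E[X]E[W] + \frac{\mathrm{Var}(Y) + (E[Y])^2}{4} \\
  &= 0 + \frac{2 + 0}{4} = \frac{1}{2}.
\end{align*}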


c. Note that $E[X^2] = 1$. Also,

\[ E[\hat{X}_M^2] = \frac{E[Y^2]}{4} = \frac{1}{2}. \]

In the above, we also found $\mathrm{MSE} = E[\tilde{X}^2] = \frac{1}{2}$. Therefore, we have

\[ E[X^2] = E[\hat{X}_M^2] + E[\tilde{X}^2]. \]
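The results of Example 9.7 can also be checked by simulation. The following Monte Carlo sketch is not part of the original solution. The sample size and seed are arbitrary choices, and the estimates are only approximate.

```python
import random

random.seed(1)   # arbitrary seed, for reproducibility
n = 200_000      # arbitrary sample size

sum_x2 = 0.0     # accumulates X^2
sum_hat2 = 0.0   # accumulates (X_M)^2, with X_M = Y / 2
sum_err2 = 0.0   # accumulates (X - X_M)^2

for _ in range(n):
    x = random.gauss(0.0, 1.0)   # X ~ N(0, 1)
    w = random.gauss(0.0, 1.0)   # W ~ N(0, 1), independent of X
    y = x + w                    # observation Y = X + W
    x_hat = y / 2                # MMSE estimator from part (a)
    err = x - x_hat              # estimation error
    sum_x2 += x * x
    sum_hat2 += x_hat * x_hat
    sum_err2 += err * err

e_x2 = sum_x2 / n
e_hat2 = sum_hat2 / n
mse = sum_err2 / n

print(f"E[X^2]            ~ {e_x2:.3f}")
print(f"E[X_M^2]          ~ {e_hat2:.3f}")
print(f"MSE = E[error^2]  ~ {mse:.3f}")
print(f"E[X_M^2] + MSE    ~ {e_hat2 + mse:.3f}")
```

The simulated values should be close to 1, 1/2, and 1/2 respectively, matching parts (b) and (c).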

